Supplementary Materials to Adaptive and Transparent Cache Bypassing for GPUs
Abstract
This document is the supplementary file accompanying the SC-15 conference paper "Adaptive and Transparent Cache Bypassing for GPUs." In this document, we first present the experimental figures for the four additional GPU platforms that could not fit into the original paper due to page limits. We then show the simulation results for the hardware approach that attempts to reduce bypass overhead. Finally, we analyze the performance patterns of the applications with respect to different bypassing thresholds, which may explain why certain applications benefit significantly more from cache bypassing than others.
Publication date: 2015